Mini Challenge 2: Migrant Boats (geo-temporal analysis)
Authors and Affiliations:
Student Team: NO
Tool(s): For the VAST competition, the analyses were performed primarily in the Palantir Government platform and, to a lesser extent, in Google Earth and the Palantir Finance platform. Both Palantir platforms are being developed by Palantir Technologies, based in Palo Alto, California. Palantir Technologies was founded in 2004 and works with customers across the Intelligence and Finance Communities.
The development team at Palantir made the decision early in the company’s history
to develop an analytic platform based on a foundation of openness, a trait
not often seen in the intelligence community. As old institutions transition
into a world where information is increasingly a commodity, the archaic
paradigms of locking down knowledge are giving way to an environment where
analysis is the real power. Palantir Technologies is able to liberate this power
in several concrete ways: The first is data integration - whether structured
or unstructured, Palantir provides standard and extensible interfaces for
bringing information into a common environment. The second is Search and
Discovery, whereby these disparate data stores can be explored as though they
were one. The third is Knowledge Management in which all the knowledge that
is discovered is treated like another data source so no analysis is lost. And
finally, the fourth is Collaboration whereby many analysts working together
can truly leverage their collective mind. Through our open APIs and numerous
(and multiplying) extensibility points, Palantir has succeeded in creating a
genuine platform for application-development and information-analysis.
Detailed Answer:
Palantir features integration with the leaders in Geospatial Information Systems (GIS),
including Google Earth and ESRI’s ArcGIS. Palantir Government 2.0, which will be released
between this contest’s deadline and the VAST symposium, focuses in part on increasing
geospatial capabilities. Even with our current level of integration, however, this
challenge is exactly the kind of open-ended, large-scale, pattern-seeking analysis that our
platform was designed for. Palantir has led to deep insights into the geo-temporal
migration data provided.
We began by transforming the original XML data into Palantir's open XML format (pXML) with
a simple XSLT in order to feed the data into our GUI importer (figure 1). Extracting
properties and relationships from either structured or unstructured datasets in Palantir
requires minimal effort and time, even for non-technical users. We grouped the data into
launch, landing, and interdiction events with links to the passengers aboard each craft.
Whenever launch data was available, we linked the launch event to the resulting landing or
interdiction. To aid our temporal analysis, we made the arbitrary assumption that "Go
Fasts" launched one day before interdiction/landing, rustics two days before, and rafts
three days before.
Figure 1: Data Import
The flexibility of Palantir’s
platform allowed us to analyze the migration data from many dimensions. Our
team was able to investigate landing sites in relation to time, type of craft,
launch sites, and success rate. With a quick filter, we brought up all the
landing events in our database and displayed them in the Graph (figure
2.1). A tap of ‘Ctrl-A’ and ‘Ctrl-G’ is all it takes to export the events to
Google Earth (figure 2.2).
Figure 2.1: All landing events in the Palantir Graph
Figure 2.2: The same events in Google Earth
We briefly explored the map and manipulated the time
window to get an overview of the situation. Immediately, we noticed that the
landings started around the Florida Keys, later spreading much farther north.
On closer inspection, we caught some landings on the Yucatan Peninsula of
Mexico. We grouped the landings into four zones: South, East, and West Florida,
and Mexico. We used the geography of Florida to create latitude/longitude zones
that would designate our landing regions: 26.15N marked the northern limit of “South
Florida” and a line of longitude (81.5W) designated the east-west border (figure
3.1). We converted these coordinates into filters in Palantir and easily
divided the landing events into multiple groups (figure 3.2).
Figures 3.1 and 3.2: Landing zones, geospatially (left), arranged in
Palantir (right)
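The zone filters described above can be sketched outside Palantir as a small function. This is only an illustration, not the filter definitions we built in the platform: the 26.15N and 81.5W boundaries come from the text, while the Mexico thresholds (`MEXICO_MAX_LAT`, `MEXICO_MAX_LON`) and the function name `landing_zone` are assumptions introduced for the example.

```python
# Boundaries taken from the text (west longitudes are negative).
SOUTH_FLORIDA_MAX_LAT = 26.15   # 26.15N: northern limit of "South Florida"
EAST_WEST_LON = -81.5           # 81.5W: border between East and West Florida

# ASSUMPTION: the text gives no coordinates for the Mexico zone.
# These hypothetical thresholds simply box in the Yucatan coast.
MEXICO_MAX_LAT = 23.0
MEXICO_MAX_LON = -85.0


def landing_zone(lat: float, lon: float) -> str:
    """Assign a landing event to one of the four zones by its coordinates."""
    if lat < MEXICO_MAX_LAT and lon < MEXICO_MAX_LON:
        return "Mexico"
    if lat <= SOUTH_FLORIDA_MAX_LAT:
        return "South Florida"
    if lon > EAST_WEST_LON:
        return "East Florida"
    return "West Florida"


if __name__ == "__main__":
    # Illustrative coordinates only, not points from the contest data.
    for point in [(24.55, -81.78), (27.9, -82.8), (27.0, -80.1), (21.16, -86.85)]:
        print(point, landing_zone(*point))
```

Checking Mexico first keeps far-southwestern points out of the South Florida bucket; the order of the remaining tests mirrors the way the latitude line was applied before the longitude line.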
Figures 4.1 and 4.2: Landings, one year time slice, grouped (left) and
one month, expanded (right)
By scrolling
through the Timeline in Palantir (figures 4.1 and 4.2), manipulating data in
Google Earth, and doing basic statistical analysis on numbers from the
Histogram tool, we reconstructed the following history. Migrants began escaping
from Isla del Sueño to the nearby tip of Florida in early 2005, and it remained
a common landing site throughout the three-year period. In early 2006, they
began landing in West Florida, and this site gained popularity in mid-2006.
Mexico followed in mid-2006, becoming the third landing site that migrants
chose, and it quickly reached a high activity level despite the distance. In
fact, over the entire three-year period, the greatest proportion of
successful landings was on the Yucatan Peninsula, probably because the U.S. Coast
Guard does not patrol Mexican waters. Finally, in early 2007, migrants began to land
on the eastern coast of Florida; however, this site never reached the numbers
of landings seen in the other three zones. Overall, roughly 43% of successful
landings were in Mexico, 33% in South Florida, 17% in West Florida, and fewer
than 7% in East Florida.
The landing sites could often be characterized as hosting dense populations; perhaps
density makes it easier to blend in, or it is simply not a concern for migrants. Of all
voyages, 65% employed rustic boats, 20% rafts, and 15% “Go Fasts” (percentages reported
are rounded). A relationship between boat choice and final destination is present but
weak: rustics were more popular among new West Floridians (70%) than South Floridians
(61%); “Go Fasts” were a somewhat less common choice for would-be East Floridians (5%
below average); and rafts were both unusually unpopular for those heading to the West (8%
below average) and unusually popular on the East (7% above average). Boat-type choice had
almost no effect on the overall landing success rate, which is extremely close to 48% for
all three vessel types. The less stable monthly success rates began low (25% or below),
rose well above 60%, and dropped to 0% again before reaching a relatively consistent level
of 40-70% in the last year and a half (see the chart in short answer 3). Trips to South
Florida and Mexico produced 7% more fatality-free journeys than those to West Florida. In
sum, we uncovered many approaches to analyzing landing sites simply by having Palantir
sort landing events into zones and examining the resulting aggregates, conveniently
presented by the Histogram (figure 5).
Figure 5: Histograms for landings in Mexico (left) and East Florida (right)
Our most interesting
analysis track viewed the landing data with respect to its associated launch
data. We created a new investigation in Palantir and added all launch events
(figure 6.1).
Figure 6.1: All launch events in Palantir, divided by launch site
(picture labeled externally)
After exporting these to Google Earth, we noticed
that four very distinct portions of the island were being used for launches. On
closer inspection, we decided to divide the northwestern region into two sites,
as the coast juts out at one point and creates two distinct inlets (figure
6.2).
Figure 6.2: The launch sites, numbered
Using the same basic workflow as we used for the
landings, we created geographic filters in Palantir to divide the launch events
into groups. The launch data is incomplete but very interesting nonetheless.
The very first launch took place from site 5; and sites 1, 4, and 5 were the
most popular for the first two years. In the third year, activity exploded
across the launch sites. With this background information obtained, we asked
Palantir to do a link-by search and bring in all landing events that share an
“appears in” relationship with our launches.
Then we applied the filters we had created to designate landing zones
(figure 7) to see if there was any correlation between launch sites and landing
zones.
Figure 7: All launches from site 5 that landed in South Florida
Using the Histogram from this dataset, we uncovered
a rather strong link between launch site and the eventual landing site of
successful boats (figure 8).
Figure 8: Relationship between launch site and destination
Importantly, even when we calculated this relationship in terms of all boats leaving the
site rather than only the successful ones, the site preferences remained the same. This
insight could be used to improve Coast Guard interdiction rates: if intelligence suggests that a boat
has left from the southeast (site 5) of Isla del Sueño, it is probably heading for Mexico;
if it is leaving from the island’s northernmost point (site 3), watch the eastern approach to
Florida. Moreover, the various launch sites have very different success rates, with boats from
site 4 being the most likely to land (58%) and those from site 3 the least likely (31%). With
the unique power of Palantir, this kind of complex, multi-dimensional analysis is not just
possible but intuitive and easy to perform.
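The launch-site comparison above amounts to a cross-tabulation of launch site against outcome. The following is a minimal sketch of that tabulation, assuming each voyage has been reduced to a (launch site, outcome) pair; the `voyages` records are made-up toy values for illustration, not the contest data, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict

# ASSUMPTION: toy records only. Each pair is (launch_site, outcome), where the
# outcome is either a landing zone or the string "interdicted".
voyages = [
    (5, "Mexico"), (5, "Mexico"), (5, "interdicted"),
    (3, "East Florida"), (3, "interdicted"), (3, "interdicted"),
    (4, "South Florida"), (4, "South Florida"), (4, "interdicted"),
]


def destination_table(records):
    """Count outcomes per launch site."""
    table = defaultdict(Counter)
    for site, outcome in records:
        table[site][outcome] += 1
    return table


def success_rate(outcomes):
    """Fraction of a site's voyages that ended in a landing."""
    total = sum(outcomes.values())
    return (total - outcomes["interdicted"]) / total


def preferred_destination(outcomes):
    """Most common landing zone among a site's successful voyages."""
    landed = {zone: n for zone, n in outcomes.items() if zone != "interdicted"}
    return max(landed, key=landed.get) if landed else None


if __name__ == "__main__":
    table = destination_table(voyages)
    for site in sorted(table):
        print(site, preferred_destination(table[site]),
              f"{success_rate(table[site]):.0%}")
```

Keeping interdictions in the same table as landings makes it easy to compute preferences over successful boats only or over all boats leaving a site, which is the comparison made above.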
We also looked at the Coast Guard ships that interdicted vessels from the
island. We found that the USS Ironwood had the most interdictions, with
26, and the USS Bold Reef had the fewest, with 13.
Figure 9: Coast Guard Vessels surrounded by their interdictions
Lastly, we analyzed the rosters of the vessels. With the Histogram, we noticed
that some names appeared in multiple voyages, and two stood out:
Jesus Vidro and Eduardo Catalano. They traveled together on two
interdicted voyages before finally landing successfully.
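The roster check can be approximated with simple counting: tally how often each name appears across voyages, then tally how often each pair of names shares a voyage. This is a sketch of the idea rather than of the Histogram itself; the `rosters` dictionary uses placeholder voyage identifiers and passenger names, all of which are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# ASSUMPTION: placeholder rosters, not the contest data.
rosters = {
    "voyage-1": {"Passenger A", "Passenger B", "Passenger C"},
    "voyage-2": {"Passenger A", "Passenger B", "Passenger D"},
    "voyage-3": {"Passenger A", "Passenger B"},
    "voyage-4": {"Passenger E"},
}


def repeat_travelers(rosters):
    """Names that appear on more than one voyage, with their voyage counts."""
    counts = Counter(name for names in rosters.values() for name in names)
    return {name: n for name, n in counts.items() if n > 1}


def shared_voyages(rosters):
    """Pairs of names that appear together on more than one voyage."""
    pairs = Counter()
    for names in rosters.values():
        # Sorting gives each pair a single canonical ordering.
        for pair in combinations(sorted(names), 2):
            pairs[pair] += 1
    return {pair: n for pair, n in pairs.items() if n > 1}


if __name__ == "__main__":
    print(repeat_travelers(rosters))
    print(shared_voyages(rosters))
```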
Figure 10: Shared travel of Vidro and Catalano
Boat-2 Characterize the geographical patterns of interdiction over the three years.
We exported interdictions from Palantir to Google
Earth and created an axis using Lat x and Long y. With our geospatial animation,
we then constructed a four-stage progression of Coast Guard interdictions.
Encounters begin at Florida’s tip and north of Isla del Sueño, and through 2005
they spread horizontally and southward. In mid-2006
they gain a northward component, but this spread slows in early
2007, when interdictions shift southwest. Finally, interdictions push east to
surround the island later that year. We also noted the dissolution of several
“boundaries”: the first encounter to clear the island’s southern tip occurs in
May 2006, and an encounter first crosses our “north” line in October 2005 on the
east but not until June 2006 on the
west. Throughout these stages, interdictions never reach as far north as landings
or cover Mexico (perhaps a jurisdictional problem), a critical strategic
weakness to address.
Figure 1: Interdictions during early 2005
Figure 2: Interdictions grouped by boat with all boats during 2006’s peak period selected
(note: the close-up was superimposed onto the picture)
Boat-3 What is the successful landing rate over the time period?
We began by adding all
interdictions and landings to the Graph and viewing the events in Palantir’s
Histogram. The Histogram reveals that there are 917 events, 441 of which are
landings and the remaining 476 of which are interdictions. Assuming that no
vessels are lost at sea (i.e., every voyage ends in a logged landing or
interdiction), the overall landing success rate is 48.09%. We can also find
much more granular success rates with ease. Activating a temporal filter on the
Timeline (for example, March 2006) highlights only the events during that time
period and restricts the Histogram to them. Thus, we can see that there are five landings and
thirteen interdictions that month, for a landing rate of slightly less than
28%. The yearly landing rates are roughly 32%, 39%, and 53% for 2005, 2006, and
2007 respectively, a seemingly steady rise that is much more erratic in reality.
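The rates quoted above follow directly from the event counts. As a quick check of the arithmetic, under the same assumption that every voyage ends in a logged landing or interdiction, the calculation can be written as follows (the function name `landing_rate` is ours, introduced only for this sketch):

```python
def landing_rate(landings: int, interdictions: int) -> float:
    """Landing success rate, in percent, for a set of logged events."""
    total = landings + interdictions
    if total == 0:
        raise ValueError("no events in the selected time window")
    return 100.0 * landings / total


# Counts taken from the text: 917 events overall, 441 of them landings.
overall = landing_rate(441, 917 - 441)

# March 2006 filter: five landings and thirteen interdictions.
march_2006 = landing_rate(5, 13)

if __name__ == "__main__":
    print(f"overall:    {overall:.2f}%")     # 48.09%
    print(f"March 2006: {march_2006:.2f}%")  # 27.78%
```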
We also exported the data from Palantir Government and imported it into Palantir
Finance for advanced time series analysis, as seen in figure 1. It appears that
the rustic arrival rate is the most stable and the “Go Fast” rate is the most
erratic over time.
Figure 1: Month-by-month chart of landing success rates in Palantir Finance
Figure 2: An example of the one-month temporal filters used to create the chart above